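# stringsAsFactors = TRUE keeps department as a factor (no longer the default since R 4.0)
climate <- read.csv("climate_spending.csv", header = TRUE, stringsAsFactors = TRUE)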
library(ggplot2)

Read the climate data

summary(climate)
##            department      year       gcc_spending      
##  Agriculture    :18   Min.   :2000   Min.   :3.113e+07  
##  All Other      :18   1st Qu.:2004   1st Qu.:7.604e+07  
##  Commerce (NOAA):18   Median :2008   Median :1.552e+08  
##  Energy         :18   Mean   :2008   Mean   :3.465e+08  
##  Interior       :18   3rd Qu.:2013   3rd Qu.:3.209e+08  
##  NASA           :18   Max.   :2017   Max.   :1.676e+09  
##  NSF            :18

Check the data

attach (climate)
names (climate)
## [1] "department"   "year"         "gcc_spending"

Plot the data

Scatter plot of gcc_spending (y) against year (x), colored by department:

ggplot(climate, aes(x = year, y =gcc_spending, color = department)) +
  geom_point()

NASA's gcc_spending is high compared with the other departments. The data fluctuate from 2000 to 2017, and the highest values appear between 2000 and 2003.

Box plot of gcc_spending (y) against year (x), colored by department:

ggplot(climate, aes(x = year, y =gcc_spending, color = department)) +
  geom_boxplot()

The box plot shows a large spread in the NASA data around 2014. Interestingly, the Interior data show only a small spread around 2011.
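Because year is a continuous variable, one common way to compare spending distributions is to put the department on the x axis instead. The sketch below uses a small made-up data frame (`toy`, with hypothetical values) rather than the climate data, so it only illustrates the layout.

```r
library(ggplot2)

# Made-up data for illustration only: 3 departments, 6 years each
set.seed(1)
toy <- data.frame(
  department = rep(c("A", "B", "C"), each = 6),
  year       = rep(2000:2005, times = 3),
  spending   = c(runif(6, 1, 2), runif(6, 2, 4), runif(6, 8, 12))
)

# One box per department summarizes the spread across years
p_box <- ggplot(toy, aes(x = department, y = spending, color = department)) +
  geom_boxplot()
print(p_box)
```

With this layout each box covers all years for one department, so differences between departments are read directly from the box positions.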

Line plot of gcc_spending (y) against year (x), colored by department:

ggplot(climate, aes(x = year, y =gcc_spending, color = department)) +
  geom_line()

This plot shows the climate data by department as a time series over year. gcc_spending fluctuates for all departments, but NASA shows high values compared with the others.

Log of gcc_spending

df <- climate
df$lngcc_spending <- log(df$gcc_spending)  # natural log

Summary

summary(df)
##            department      year       gcc_spending       lngcc_spending 
##  Agriculture    :18   Min.   :2000   Min.   :3.113e+07   Min.   :17.25  
##  All Other      :18   1st Qu.:2004   1st Qu.:7.604e+07   1st Qu.:18.15  
##  Commerce (NOAA):18   Median :2008   Median :1.552e+08   Median :18.86  
##  Energy         :18   Mean   :2008   Mean   :3.465e+08   Mean   :19.02  
##  Interior       :18   3rd Qu.:2013   3rd Qu.:3.209e+08   3rd Qu.:19.59  
##  NASA           :18   Max.   :2017   Max.   :1.676e+09   Max.   :21.24  
##  NSF            :18
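As a quick check, the log-scale summary can be recomputed by hand from the raw minimum and maximum printed above. This sketch uses only those two printed values and base R. `log()` is the natural logarithm and `exp()` reverses it.

```r
# Minimum and maximum of gcc_spending, copied from the summary output
spend_min <- 3.113e+07
spend_max <- 1.676e+09

# Natural log, as used for df$lngcc_spending
ln_min <- log(spend_min)
ln_max <- log(spend_max)
round(c(ln_min, ln_max), 2)   # 17.25 21.24, matching the summary

# exp() maps the log values back to the original scale
all.equal(exp(ln_min), spend_min)
```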

Linear model fit by department over time (year), with a log10-scaled x axis:

# Color is mapped in the point layer only, so geom_smooth() fits one line per department panel
ggplot(climate, aes(x = year, y = gcc_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "lm") + facet_wrap(~department)

The linear fit, with year as a factor for the point colors, shows that NASA has high spending that fluctuates from 2000 to 2017.
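In ggplot2, a discrete color mapping placed in the top-level `aes()` also defines the groups used by `geom_smooth()`, so coloring by `factor(year)` there would ask for one model per year. A minimal sketch with made-up data (the `toy` data frame is hypothetical) maps the color in the point layer only, so the smoother fits a single line through all points.

```r
library(ggplot2)

# Made-up data for illustration only
toy <- data.frame(year = 2000:2009,
                  spending = c(5, 6, 6, 7, 9, 8, 10, 11, 11, 13))

p_fit <- ggplot(toy, aes(x = year, y = spending)) +
  # color applies to the points only
  geom_point(aes(color = factor(year))) +
  # the smoother sees all points as one group
  geom_smooth(method = "lm", formula = y ~ x)
print(p_fit)
```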

GLM fit by department over time (year), with a log10-scaled x axis:

# Color is mapped in the point layer only, so geom_smooth() fits one line per department panel
ggplot(climate, aes(x = year, y = gcc_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "glm") + facet_wrap(~department)

The GLM plot shows a result similar to the linear model plot.
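The two plots look alike because a GLM with the default Gaussian family and identity link is the same fit as ordinary least squares. A small base R sketch with made-up data (`toy` is hypothetical) compares the coefficients from `lm()` and `glm()`.

```r
# Made-up data for illustration only
set.seed(42)
toy <- data.frame(x = 1:20)
toy$y <- 3 + 0.5 * toy$x + rnorm(20)

fit_lm  <- lm(y ~ x, data = toy)
fit_glm <- glm(y ~ x, family = gaussian, data = toy)

# A Gaussian GLM with identity link gives the least-squares coefficients
all.equal(coef(fit_lm), coef(fit_glm))
```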

Generalized Linear Model (GLM) analysis

glm_climate <- glm(year ~ gcc_spending + department, family = gaussian, data = climate)

Summary of the GLM (Generalized Linear Model)

summary (glm_climate)
## 
## Call:
## glm(formula = year ~ gcc_spending + department, family = gaussian, 
##     data = climate)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -11.2060   -4.2741    0.4521    4.3017    9.1212  
## 
## Coefficients:
##                             Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                2.007e+03  1.348e+00 1488.739  < 2e-16 ***
## gcc_spending               1.698e-08  6.219e-09    2.730  0.00731 ** 
## departmentAll Other        7.815e-02  1.733e+00    0.045  0.96412    
## departmentCommerce (NOAA) -3.450e+00  2.145e+00   -1.608  0.11042    
## departmentEnergy          -1.596e+00  1.829e+00   -0.872  0.38473    
## departmentInterior         7.265e-01  1.753e+00    0.414  0.67938    
## departmentNASA            -2.277e+01  8.519e+00   -2.673  0.00859 ** 
## departmentNSF             -3.429e+00  2.141e+00   -1.602  0.11182    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 27.03431)
## 
##     Null deviance: 3391.5  on 125  degrees of freedom
## Residual deviance: 3190.0  on 118  degrees of freedom
## AIC: 782.74
## 
## Number of Fisher Scoring iterations: 2

The GLM results show that gcc_spending and the NASA department are significant (p < 0.01) in the model for year. The stars mark the significance level of each term.
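The significant terms can also be picked out programmatically. The sketch below copies the estimates and p-values from the printed summary into a data frame (`coefs`, a name introduced here for illustration) and filters at the 5% level.

```r
# Estimates and p-values copied from the summary(glm_climate) output
coefs <- data.frame(
  term = c("(Intercept)", "gcc_spending", "departmentAll Other",
           "departmentCommerce (NOAA)", "departmentEnergy",
           "departmentInterior", "departmentNASA", "departmentNSF"),
  estimate = c(2.007e+03, 1.698e-08, 7.815e-02, -3.450e+00,
               -1.596e+00, 7.265e-01, -2.277e+01, -3.429e+00),
  # the intercept is printed as "< 2e-16"; 2e-16 is used as a stand-in
  p_value = c(2e-16, 0.00731, 0.96412, 0.11042,
              0.38473, 0.67938, 0.00859, 0.11182),
  stringsAsFactors = FALSE
)

# Keep the terms significant at the 5% level
significant <- coefs[coefs$p_value < 0.05, ]
significant$term
```

With the fitted model in memory, `coef(summary(glm_climate))` returns the same table as a matrix, so the copy step is not needed.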

Anova Test

anova(glm_climate)
## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: year
## 
## Terms added sequentially (first to last)
## 
## 
##              Df Deviance Resid. Df Resid. Dev
## NULL                           125     3391.5
## gcc_spending  1    5.318       124     3386.2
## department    6  196.134       118     3190.0

The Anova test shows that gcc_spending uses 1 degree of freedom and department uses 6. Department accounts for more of the deviance in year (196.134) than gcc_spending does (5.318).
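To put the deviance column in context, each term's deviance can be expressed as a share of the null deviance. This sketch uses only the numbers printed in the table; the variable names are introduced here for illustration.

```r
# Values copied from the anova(glm_climate) table
null_dev <- 3391.5
dev <- c(gcc_spending = 5.318, department = 196.134)

# Share of the null deviance attributed to each term, in percent
share <- 100 * dev / null_dev
round(share, 2)

# The two terms together account for the drop from null to residual deviance
sum(dev)
```

Both shares are small (about 0.16% and 5.78%), so most of the variation in year is left unexplained by this model.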

Plot the GLM

plot (glm_climate)

The generalized linear model for the climate data shows a significant effect for the NASA department, with year as the response. A GLM analysis describes which factors influence the response variable.
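Calling `plot()` on a fitted GLM draws a series of diagnostic plots, four by default. A minimal sketch, using made-up data and a hypothetical output file name, arranges them on one page and saves them to a PDF.

```r
# Made-up data for illustration only
set.seed(7)
toy <- data.frame(x = 1:30)
toy$y <- 2 + 0.3 * toy$x + rnorm(30)
fit <- glm(y ~ x, family = gaussian, data = toy)

# Draw the default diagnostic plots in a 2 x 2 grid and save them
out_file <- file.path(tempdir(), "glm_diagnostics.pdf")
pdf(out_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```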

Read the Energy Data

Read the CSV data

Read the CSV data from the working directory:

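# stringsAsFactors = TRUE keeps department as a factor (no longer the default since R 4.0)
energy <- read.csv("energy_spending.csv", header = TRUE, stringsAsFactors = TRUE)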

Summary

summary(energy)
##                               department       year     
##  Adv Sci Comp Res*                 : 22   Min.   :1997  
##  Atomic Energy Defense             : 22   1st Qu.:2002  
##  Basic Energy Sciences*            : 22   Median :2008  
##  Bio and Env Research*             : 22   Mean   :2008  
##  Energy Efficiency and Renew Energy: 22   3rd Qu.:2013  
##  Fossil Energy                     : 22   Max.   :2018  
##  (Other)                           :110                 
##  energy_spending    
##  Min.   :5.690e+07  
##  1st Qu.:4.762e+08  
##  Median :6.948e+08  
##  Mean   :1.456e+09  
##  3rd Qu.:1.357e+09  
##  Max.   :7.574e+09  
## 

Attach the Data and Inspect Its Structure

attach (energy)
## The following objects are masked from climate:
## 
##     department, year
str (energy)
## 'data.frame':    242 obs. of  3 variables:
##  $ department     : Factor w/ 11 levels "Adv Sci Comp Res*",..: 11 1 3 4 7 8 10 5 9 6 ...
##  $ year           : int  1997 1997 1997 1997 1997 1997 1997 1997 1997 1997 ...
##  $ energy_spending: num  3.59e+09 2.17e+08 9.33e+08 5.51e+08 3.31e+08 ...
names (energy)
## [1] "department"      "year"            "energy_spending"
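The masking message above appears because both attached data frames contain columns named department and year. A minimal sketch, using two made-up data frames (`a` and `b`), shows how `with()` or `$` can refer to a column of one specific data frame without attaching anything, which avoids the ambiguity.

```r
# Made-up data frames that share column names
a <- data.frame(year = 2000:2002, spending = c(10, 12, 15))
b <- data.frame(year = 1997:1999, spending = c(1, 2, 3))

# with() evaluates an expression inside one specific data frame
mean_a <- with(a, mean(spending))
mean_b <- with(b, mean(spending))
c(mean_a, mean_b)

# $ selects a single column explicitly
range(b$year)
```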

Plot

Scatter plot of the data with ggplot:

ggplot(energy, aes(x = year, y =energy_spending, color = department)) +
   geom_point()

The figure shows that energy spending by Atomic Energy Defense increased markedly from 1997 to 2018. Adv Sci Comp Res also increased, but has decreased since 2010.

Box plot of the data with ggplot:

ggplot(energy, aes(x = year, y =energy_spending, color = department)) +
   geom_boxplot()

The figure shows a large spread for Atomic Energy Defense around 2000, and a noticeable spread for Adv Sci Comp Res around 2016, compared with the other departments over the 1997 - 2018 time series.

Line plot of the data with ggplot:

ggplot(energy, aes(x = year, y =energy_spending, color = department)) +
   geom_line()

The figure shows that Atomic Energy Defense fluctuates from 1997 to 2017, as does Office of Science R&D, and both have high energy spending compared with the other departments. The data were recorded from 1997 to 2018.

Linear model fit by department over time (year), with a log10-scaled x axis:

# Color is mapped in the point layer only, so geom_smooth() fits one line per department panel
ggplot(energy, aes(x = year, y = energy_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "lm") + facet_wrap(~department)

The figure shows that Atomic Energy Defense has high energy spending throughout 1997 - 2018. Likewise, Office of Science R&D shows high values over the same period.

GLM fit by department over time (year), with a log10-scaled x axis:

# Color is mapped in the point layer only, so geom_smooth() fits one line per department panel
ggplot(energy, aes(x = year, y = energy_spending)) +
  geom_point(aes(color = factor(year))) + scale_x_log10() +
  geom_smooth(method = "glm") + facet_wrap(~department)

The figure shows no difference from the lm plot above. The highest values appear for two departments (Atomic Energy Defense and Office of Science R&D).

Generalized Linear Model (GLM) analysis

glm_energy <- glm(year ~ energy_spending + department, family = gaussian, data = energy)

Summary of the GLM (Generalized Linear Model)

summary (glm_energy)
## 
## Call:
## glm(formula = year ~ energy_spending + department, family = gaussian, 
##     data = energy)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -11.7463   -4.0891   -0.0892    4.2582   10.4473  
## 
## Coefficients:
##                                                Estimate Std. Error
## (Intercept)                                   2.004e+03  1.216e+00
## energy_spending                               8.968e-09  9.140e-10
## departmentAtomic Energy Defense              -4.227e+01  4.612e+00
## departmentBasic Energy Sciences*             -1.015e+01  1.945e+00
## departmentBio and Env Research*              -2.321e+00  1.664e+00
## departmentEnergy Efficiency and Renew Energy -6.038e+00  1.759e+00
## departmentFossil Energy                      -1.169e+00  1.652e+00
## departmentFusion Energy Sciences*            -7.648e-02  1.647e+00
## departmentHigh-Energy Physics*               -4.555e+00  1.712e+00
## departmentNuclear Energy                     -5.946e-01  1.649e+00
## departmentNuclear Physics*                   -1.384e+00  1.653e+00
## departmentOffice of Science R&D              -3.749e+01  4.161e+00
##                                               t value Pr(>|t|)    
## (Intercept)                                  1648.525  < 2e-16 ***
## energy_spending                                 9.812  < 2e-16 ***
## departmentAtomic Energy Defense                -9.165  < 2e-16 ***
## departmentBasic Energy Sciences*               -5.220    4e-07 ***
## departmentBio and Env Research*                -1.394 0.164521    
## departmentEnergy Efficiency and Renew Energy   -3.433 0.000707 ***
## departmentFossil Energy                        -0.708 0.479881    
## departmentFusion Energy Sciences*              -0.046 0.963015    
## departmentHigh-Energy Physics*                 -2.661 0.008330 ** 
## departmentNuclear Energy                       -0.361 0.718681    
## departmentNuclear Physics*                     -0.837 0.403358    
## departmentOffice of Science R&D                -9.011  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 29.85287)
## 
##     Null deviance: 9740.5  on 241  degrees of freedom
## Residual deviance: 6866.2  on 230  degrees of freedom
## AIC: 1522.4
## 
## Number of Fisher Scoring iterations: 2

The GLM analysis shows that energy_spending is significant, and that the significant department terms are Atomic Energy Defense, Basic Energy Sciences, Energy Efficiency and Renew Energy, High-Energy Physics, and Office of Science R&D. The stars mark terms whose coefficients differ significantly from zero in the model for year.
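Each department coefficient above is a contrast against the reference level, which R takes to be the first factor level (here "Adv Sci Comp Res*", the one level that has no row in the coefficient table). A sketch with a made-up factor (`dept`, hypothetical) shows how `relevel()` changes that baseline before a model is fitted.

```r
# Made-up factor and response for illustration only
dept <- factor(c("A", "B", "C", "A", "B", "C"))
y <- c(1, 2, 5, 1.2, 2.1, 4.8)

# By default the first level is the reference
levels(dept)[1]

# Choose "C" as the baseline instead
dept2 <- relevel(dept, ref = "C")
levels(dept2)

# Coefficients are now reported relative to "C"
names(coef(lm(y ~ dept2)))
```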

Anova Test

anova(glm_energy)
## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: year
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev
## NULL                              241     9740.5
## energy_spending  1   152.03       240     9588.5
## department      10  2722.31       230     6866.2

The Anova test shows that energy_spending uses 1 degree of freedom and department uses 10. The residual deviance is 9588.5 after adding energy_spending and falls to 6866.2 after adding department, so department accounts for more of the deviance (2722.31 versus 152.03).

Plot GLM

plot (glm_energy)

The generalized linear model is used here to estimate how these factors relate to the time series (year) from 1997 - 2018.

Read the RD Data

Read the CSV data

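# stringsAsFactors = TRUE keeps department as a factor (no longer the default since R 4.0)
rd <- read.csv("fed_r_d_spending.csv", header = TRUE, stringsAsFactors = TRUE)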
library(ggplot2)

Summary

summary (rd)
##    department       year        rd_budget         total_outlays      
##  DHS    : 42   Min.   :1976   Min.   :0.000e+00   Min.   :3.718e+11  
##  DOC    : 42   1st Qu.:1986   1st Qu.:9.020e+08   1st Qu.:9.904e+11  
##  DOD    : 42   Median :1996   Median :1.888e+09   Median :1.581e+12  
##  DOE    : 42   Mean   :1996   Mean   :1.035e+10   Mean   :1.880e+12  
##  DOT    : 42   3rd Qu.:2007   3rd Qu.:1.206e+10   3rd Qu.:2.729e+12  
##  EPA    : 42   Max.   :2017   Max.   :9.432e+10   Max.   :3.982e+12  
##  (Other):336                                                         
##  discretionary_outlays      gdp           
##  Min.   :1.756e+11     Min.   :1.790e+12  
##  1st Qu.:4.385e+11     1st Qu.:4.536e+12  
##  Median :5.460e+11     Median :8.230e+12  
##  Mean   :6.942e+11     Mean   :9.175e+12  
##  3rd Qu.:1.042e+12     3rd Qu.:1.432e+13  
##  Max.   :1.347e+12     Max.   :1.918e+13  
## 

Attach the data

attach (rd)
## The following objects are masked from energy:
## 
##     department, year
## The following objects are masked from climate:
## 
##     department, year
str (rd)
## 'data.frame':    588 obs. of  6 variables:
##  $ department           : Factor w/ 14 levels "DHS","DOC","DOD",..: 3 9 4 7 10 11 13 8 5 6 ...
##  $ year                 : int  1976 1976 1976 1976 1976 1976 1976 1976 1976 1976 ...
##  $ rd_budget            : num  3.57e+10 1.25e+10 1.09e+10 9.23e+09 8.02e+09 ...
##  $ total_outlays        : num  3.72e+11 3.72e+11 3.72e+11 3.72e+11 3.72e+11 ...
##  $ discretionary_outlays: num  1.76e+11 1.76e+11 1.76e+11 1.76e+11 1.76e+11 ...
##  $ gdp                  : num  1.79e+12 1.79e+12 1.79e+12 1.79e+12 1.79e+12 ...
names (rd)
## [1] "department"            "year"                  "rd_budget"            
## [4] "total_outlays"         "discretionary_outlays" "gdp"

Variable Types

typeof (rd$rd_budget)
## [1] "double"
typeof (rd$total_outlays)
## [1] "double"
typeof (rd$discretionary_outlays)
## [1] "double"
typeof (rd$gdp)
## [1] "double"
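The same check can be run on every column at once with `sapply()`. The sketch below uses a two-row data frame (`toy`, hypothetical) modeled on the `str()` output above rather than the full rd data.

```r
# Small stand-in data frame modeled on the str() output
toy <- data.frame(
  department = factor(c("DOD", "NASA")),
  year       = c(1976L, 1976L),
  rd_budget  = c(3.57e10, 1.25e10)
)

# class() of every column in one call
col_classes <- sapply(toy, class)
col_classes
```

`class()` reports "factor", "integer", and "numeric" here, while `typeof()` reports the underlying storage type, such as "double" for numeric columns.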

Plotting

Plotting the rd data (x = year, y = rd_budget, colored by total_outlays):

ggplot(rd, aes( x = year, y = rd_budget, color = total_outlays)) +
  geom_point()

The figure shows that rd_budget increased over time from 1976 to 2017. The color scale shows that total outlays increased over the same period.

Plotting the rd data (x = year, y = rd_budget, colored by gdp):

ggplot(rd, aes( x = year, y = rd_budget, color = gdp)) +
  geom_point()

The figure shows that rd_budget increased over time, as did gdp.

Plotting the data (x = year, y = rd_budget, colored by discretionary outlays):

ggplot(rd, aes( x = year, y = rd_budget, color = discretionary_outlays)) +
  geom_point()

The figure shows that rd_budget increased over time, and the color scale shows that discretionary_outlays increased as well.

Plotting the data (x = year, y = rd_budget, colored by department):

ggplot(rd, aes( x = year, y = rd_budget, color = department)) +
  geom_point()

The figure shows that the DOD budget fluctuates over time (1976 - 2017) and is high compared with the other departments.

Plotting a linear model of gdp against year with ggplot:

ggplot(rd, aes(x = year, y =gdp)) +
  geom_point() + geom_smooth(method="lm") + facet_wrap (~department)

The linear model shows that gdp increased over time in every department panel.

Plotting a linear model of rd_budget against year with ggplot:

ggplot(rd, aes(x = year, y =rd_budget)) +
  geom_point() + geom_smooth(method="lm") + facet_wrap (~department)

The figure shows that the DOD budget fluctuates over time. Its linear fit has a steep slope, indicating a large increase in budget over time. HHS and NIH also show a clear increase compared with the other departments, but the increase for DOD is the largest.
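The slope comparison can be made numeric by fitting one linear model per department and extracting the year coefficient. This is a minimal sketch on made-up data (`toy` and its column names are hypothetical), not the rd data.

```r
# Made-up data for illustration only: two departments over five years
toy <- data.frame(
  department = rep(c("A", "B"), each = 5),
  year       = rep(2000:2004, times = 2),
  budget     = c(10, 12, 14, 16, 18,   # A grows by 2 per year
                 5, 5.5, 6, 6.5, 7)    # B grows by 0.5 per year
)

# Fit budget ~ year separately for each department and keep the slope
slopes <- sapply(split(toy, toy$department), function(d) {
  coef(lm(budget ~ year, data = d))[["year"]]
})
slopes
```

A larger slope means a faster average increase per year, which is what the steeper fitted line in a panel represents.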

Plotting a linear model of total outlays against year with ggplot:

ggplot(rd, aes(x = year, y =total_outlays)) +
  geom_point() + geom_smooth(method="lm") + facet_wrap (~department)

The figure shows that total outlays increased over time, and the points lie close to the fitted line, indicating a steady increase in every department panel.

Plotting a linear model of discretionary outlays against year with ggplot:

ggplot(rd, aes(x = year, y =discretionary_outlays)) +
  geom_point() + geom_smooth(method="lm") + facet_wrap (~department)

The figure shows that discretionary outlays increased over time in every department panel, and the points lie close to the fitted line, indicating a steady increase.

Plotting a GLM of total outlays over time with ggplot, colored by gdp:

ggplot(rd, aes(x = year, y = total_outlays, color = gdp))+
  geom_point() + scale_x_log10() + geom_smooth(method = "glm")

The figure shows that total outlays increased over time. The fitted line follows the data closely, indicating a steady increase.

Plotting a GLM of total outlays over time with ggplot, colored by rd_budget:

ggplot(rd, aes(x = year, y = total_outlays, color = rd_budget))+
  geom_point() + scale_x_log10() + geom_smooth(method = "glm")

Based on the color scale, rd_budget stays low over time while total outlays increase. The GLM line follows the data closely, indicating a steady increase in total outlays over time.
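The plots above put year on a log10 axis with `scale_x_log10()`. Over a range as narrow as 1976 to 2017, log10(year) is almost a straight-line function of year, so the log axis changes the picture very little. A quick base R check:

```r
years <- 1976:2017

# Correlation between year and log10(year) over the range of the data
cor(years, log10(years))

# Total width of the axis on the log10 scale
diff(range(log10(years)))
```

The correlation is very close to 1 and the whole axis spans less than 0.01 log10 units, which is why the fitted lines look the same as on a linear axis.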

GLM (Generalized Linear Model) Analysis

The GLM analysis assesses which factors influence total outlays:

glm_rd <- glm(total_outlays ~ year + rd_budget + department + gdp + discretionary_outlays, family = gaussian, data = rd)

Summary

summary (glm_rd)
## 
## Call:
## glm(formula = total_outlays ~ year + rd_budget + department + 
##     gdp + discretionary_outlays, family = gaussian, data = rd)
## 
## Deviance Residuals: 
##        Min          1Q      Median          3Q         Max  
## -2.647e+11  -2.075e+10   1.154e+10   4.934e+10   3.322e+11  
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.142e+13  5.447e+12   5.769 1.31e-08 ***
## year                  -1.599e+10  2.758e+09  -5.800 1.10e-08 ***
## rd_budget             -2.598e+00  7.958e-01  -3.265  0.00116 ** 
## departmentDOC          2.215e+09  2.324e+10   0.095  0.92411    
## departmentDOD          1.671e+11  5.620e+10   2.973  0.00308 ** 
## departmentDOE          2.989e+10  2.497e+10   1.197  0.23182    
## departmentDOT          1.400e+09  2.324e+10   0.060  0.95198    
## departmentEPA          9.650e+08  2.323e+10   0.042  0.96688    
## departmentHHS          5.694e+10  2.905e+10   1.960  0.05047 .  
## departmentInterior     1.355e+09  2.323e+10   0.058  0.95351    
## departmentNASA         3.056e+10  2.505e+10   1.220  0.22297    
## departmentNIH          5.388e+10  2.850e+10   1.891  0.05917 .  
## departmentNSF          9.508e+09  2.341e+10   0.406  0.68482    
## departmentOther        2.899e+09  2.325e+10   0.125  0.90082    
## departmentUSDA         5.201e+09  2.329e+10   0.223  0.82335    
## departmentVA           9.220e+08  2.323e+10   0.040  0.96836    
## gdp                    1.489e-01  7.184e-03  20.722  < 2e-16 ***
## discretionary_outlays  1.472e+00  4.689e-02  31.396  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.133344e+22)
## 
##     Null deviance: 7.1508e+26  on 587  degrees of freedom
## Residual deviance: 6.4601e+24  on 570  degrees of freedom
## AIC: 31548
## 
## Number of Fisher Scoring iterations: 2

The results show that the significant factors are year, rd_budget, department DOD, gdp, and discretionary_outlays. The stars mark terms with a significant effect on the response.

Plotting the GLM

Plotting the GLM result:

plot(glm_rd)

Anova Test

anova(glm_rd)
## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: total_outlays
## 
## Terms added sequentially (first to last)
## 
## 
##                       Df   Deviance Resid. Df Resid. Dev
## NULL                                      587 7.1508e+26
## year                   1 6.7994e+26       586 3.5138e+25
## rd_budget              1 2.9410e+20       585 3.5138e+25
## department            13 2.6820e+21       572 3.5136e+25
## gdp                    1 1.7504e+25       571 1.7631e+25
## discretionary_outlays  1 1.1171e+25       570 6.4601e+24

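The Anova test shows that year accounts for by far the largest share of the deviance (6.7994e+26 of 7.1508e+26). rd_budget and department contribute very little (2.9410e+20 and 2.6820e+21). Adding gdp and then discretionary_outlays reduces the residual deviance further, to 1.7631e+25 and finally to 6.4601e+24.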
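The table is sequential ("Terms added sequentially (first to last)"), so the deviance credited to a term depends on which terms entered before it. A sketch with made-up, correlated predictors (`a` and `b` are hypothetical) shows the effect of swapping the order.

```r
# Made-up data with two correlated predictors
set.seed(3)
n <- 100
a <- rnorm(n)
b <- a + rnorm(n, sd = 0.5)
y <- 1 + 2 * a + b + rnorm(n)

tab_ab <- anova(glm(y ~ a + b))
tab_ba <- anova(glm(y ~ b + a))

# The deviance credited to "a" depends on whether it enters first or second
c(a_first = tab_ab["a", "Deviance"], a_second = tab_ba["a", "Deviance"])

# The final residual deviance is the same either way
c(tab_ab[nrow(tab_ab), "Resid. Dev"], tab_ba[nrow(tab_ba), "Resid. Dev"])
```

In the model above, year enters first, so it is credited with any deviance it shares with gdp and the outlay variables, which also grow over time.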